Extracts BitString from code point to BitString class#723
Extracts BitString from code point to BitString class#723mward-sudo wants to merge 8 commits intobartblast:devfrom
Conversation
…ds parameter validation
📝 WalkthroughWalkthroughAdded UTF-8 utilities to the Bitstring class for decoding/encoding code points and validating UTF-8 sequences. Refactored unicode.mjs to use these new Bitstring helpers instead of custom UTF-8 validation and codepoint extraction logic. Added comprehensive test coverage for the new utilities. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
assets/js/bitstring.mjs (1)
259-259: Allocating lookup-table objects per call in hot-path methods.Both
decodeUtf8CodePoint(line 259) andisValidUtf8CodePoint(line 630) recreate a small object literal on every invocation. These are called in the O(n) scan insidegetValidUtf8Length, once per multi-byte sequence.The
firstByteMasksvalues follow the pattern0x7f >> length, which avoids the allocation entirely, andminValueForLengthcan be a module-level constant array.♻️ Zero-allocation alternatives
static decodeUtf8CodePoint(bytes, start, length) { if (length === 1) return bytes[start]; - // First byte masks: 2-byte=0x1f, 3-byte=0x0f, 4-byte=0x07 - const firstByteMasks = {2: 0x1f, 3: 0x0f, 4: 0x07}; - - let codePoint = bytes[start] & firstByteMasks[length]; + // First byte masks: 2→0x1f, 3→0x0f, 4→0x07 (formula: 0x7f >> length) + let codePoint = bytes[start] & (0x7f >> length); for (let i = 1; i < length; i++) {For
isValidUtf8CodePoint, hoistminValueForLengthto a module-level (or class-level) constant:+// Module-level constant; indices 0–4, only 1–4 used. +const UTF8_MIN_CODE_POINT = [0, 0, 0x80, 0x800, 0x10000]; static isValidUtf8CodePoint(codePoint, encodingLength) { - const minValueForLength = {1: 0, 2: 0x80, 3: 0x800, 4: 0x10000}; - if (codePoint < minValueForLength[encodingLength]) return false; + if (codePoint < UTF8_MIN_CODE_POINT[encodingLength]) return false;Also applies to: 630-630
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@assets/js/bitstring.mjs` at line 259, The code repeatedly allocates small lookup objects inside hot-path functions decodeUtf8CodePoint and isValidUtf8CodePoint (e.g. firstByteMasks and minValueForLength); hoist these to module-level constants and replace the firstByteMasks literal with the computed pattern (use 0x7f >> length) or a precomputed array to avoid per-call object creation, and make minValueForLength a shared constant array used by both functions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@assets/js/bitstring.mjs`:
- Line 259: The code repeatedly allocates small lookup objects inside hot-path
functions decodeUtf8CodePoint and isValidUtf8CodePoint (e.g. firstByteMasks and
minValueForLength); hoist these to module-level constants and replace the
firstByteMasks literal with the computed pattern (use 0x7f >> length) or a
precomputed array to avoid per-call object creation, and make minValueForLength
a shared constant array used by both functions.
Closes #720
Dependencies
Please note that this PR includes commits from the PR(s) it is dependent upon. Once the dependent PR(s) are merged to the dev branch, then this PR will be rebased and will then only contain its own commits. This PR will remain in draft until that point.
Summary by CodeRabbit
New Features
Refactor
Tests